Deep learning via Hessian-free optimization

Author

  • James Martens
Abstract

We develop a 2nd-order optimization method based on the “Hessian-free” approach, and apply it to training deep auto-encoders. Without using pre-training, we obtain results superior to those reported by Hinton & Salakhutdinov (2006) on the same tasks they considered. Our method is practical, easy to use, scales nicely to very large datasets, and isn’t limited in applicability to auto-encoders, or any specific model class. We also discuss the issue of “pathological curvature” as a possible explanation for the difficulty of deep learning and how 2nd-order optimization, and our method in particular, effectively deals with it.
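The abstract does not spell out the mechanics, but the core of the Hessian-free approach is to approximately solve each Newton-like step with the conjugate gradient method, which needs only Hessian-vector (or Gauss-Newton-vector) products rather than the explicit Hessian. Below is a minimal sketch of one damped step, assuming a hypothetical grad_fn(w) that returns the gradient of the training loss; the finite-difference Hessian-vector product and fixed damping are simplifications, not the paper's exact procedure (Martens uses Gauss-Newton-vector products and Levenberg-Marquardt style damping).

```python
import numpy as np

def hessian_vector_product(grad_fn, w, v, eps=1e-4):
    """Approximate H @ v with a central finite difference of the gradient.
    grad_fn(w) is an assumed function returning the loss gradient at w."""
    return (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2 * eps)

def hf_step(grad_fn, w, damping=1e-2, cg_iters=50, tol=1e-8):
    """One Hessian-free update: solve (H + damping*I) p = -g by conjugate gradients."""
    g = grad_fn(w)
    p = np.zeros_like(w)      # update direction accumulated by CG
    r = -g                    # residual of the linear system (p starts at zero)
    d = r.copy()
    rr = r @ r
    for _ in range(cg_iters):
        Hd = hessian_vector_product(grad_fn, w, d) + damping * d
        alpha = rr / (d @ Hd)
        p += alpha * d
        r -= alpha * Hd
        rr_new = r @ r
        if rr_new < tol:
            break
        d = r + (rr_new / rr) * d
        rr = rr_new
    return w + p
```

The CG loop touches the curvature matrix only through products with vectors, so the cost per CG iteration is a small constant multiple of a gradient evaluation, which is what makes the method practical at scale.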

Similar Articles

Improved Preconditioner for Hessian Free Optimization

We investigate the use of Hessian Free optimization for learning deep autoencoders. One of the critical components in that algorithm is the choice of the preconditioner. We argue in this paper that the Jacobi preconditioner leads to faster optimization and we show how it can be accurately and efficiently estimated using a randomized algorithm.
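The abstract only states that the Jacobi (diagonal) preconditioner can be estimated with a randomized algorithm. One standard way to do this, shown below as an assumption rather than the paper's exact estimator, is a Hutchinson-style diagonal estimate: for random sign vectors v, the expectation of v * (Gv) is diag(G), so the diagonal can be approximated from a handful of curvature-vector products.

```python
import numpy as np

def estimate_diagonal(matvec, dim, num_samples=20, rng=None):
    """Randomized estimate of diag(G) using only matrix-vector products.
    matvec(v) is assumed to return G @ v, e.g. a Gauss-Newton-vector product."""
    rng = rng or np.random.default_rng(0)
    diag = np.zeros(dim)
    for _ in range(num_samples):
        v = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe vector
        diag += v * matvec(v)                  # E[v * (G v)] = diag(G)
    return diag / num_samples
```

The estimated diagonal (plus a damping term) can then serve as the preconditioner inside preconditioned conjugate gradients.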

Investigations on hessian-free optimization for cross-entropy training of deep neural networks

Context-dependent deep neural network HMMs have been shown to achieve recognition accuracy superior to Gaussian mixture models in a number of recent works. Typically, neural networks are optimized with stochastic gradient descent. On large datasets, stochastic gradient descent improves quickly during the beginning of the optimization. But since it does not make use of second order information, ...

Saddle-free Hessian-free Optimization

Nonconvex optimization problems such as the ones in training deep neural networks suffer from a phenomenon called saddle point proliferation. This means that there are a vast number of high error saddle points present in the loss function. Second order methods have been tremendously successful and widely adopted in the convex optimization community, while their usefulness in deep learning remai...

Document for “On Optimization Methods for Deep Learning”

In our ICML paper titled “On optimization methods for deep learning”, we discussed the standard and sparse autoencoder model. However, due to space limitations in our paper, we were not able to present further details about the bases learned by the sparse autoencoder model, compare the standard autoencoder with the Hessian Free approach as described in (Martens, 2010) and analyze in detail the ...

Block-diagonal Hessian-free Optimization for Training Neural Networks

Second-order methods for neural network optimization have several advantages over methods based on first-order gradient descent, including better scaling to large mini-batch sizes and fewer updates needed for convergence. But they are rarely applied to deep learning in practice because of high computational cost and the need for model-dependent algorithmic variations. We introduce a variant of ...
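The abstract is truncated, but the general idea behind a block-diagonal variant is to partition the parameters into groups (for example, one block per layer), treat the curvature matrix as block-diagonal, and solve each block's linear system independently with its own short CG run. A rough sketch under those assumptions, with a hypothetical hvp_fn(w, v) returning a full curvature-vector product:

```python
import numpy as np

def block_diagonal_hf_step(grad_fn, hvp_fn, w, blocks, damping=1e-2, cg_iters=25):
    """Sketch of a block-diagonal Hessian-free step: for each index set in `blocks`
    (e.g. the parameters of one layer), solve (H_bb + damping*I) p_b = -g_b by CG,
    using only curvature-vector products restricted to that block."""
    g = grad_fn(w)
    p = np.zeros_like(w)
    for idx in blocks:                    # idx: integer index array for one block
        def block_matvec(v_b):
            v = np.zeros_like(w)
            v[idx] = v_b
            return hvp_fn(w, v)[idx] + damping * v_b  # (H_bb + damping*I) v_b
        b = -g[idx]
        x = np.zeros_like(b)
        r = b.copy()
        d = r.copy()
        rr = r @ r
        for _ in range(cg_iters):
            Ad = block_matvec(d)
            alpha = rr / (d @ Ad)
            x += alpha * d
            r -= alpha * Ad
            rr_new = r @ r
            if rr_new < 1e-10:
                break
            d = r + (rr_new / rr) * d
            rr = rr_new
        p[idx] = x
    return w + p
```

Because the blocks are solved independently, the per-block systems are much smaller than the full problem, which is one route to reducing the computational cost the abstract refers to.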

Publication date: 2010